Methods of identifying an organism

ABSTRACT

This disclosure features methods of identifying an organism. In certain embodiments, the invention provides methods of distinguishing virulent and non-virulent strains of Listeria monocytogenes.

RELATED APPLICATION

This application is a continuation of U.S. nonprovisional patent application Ser. No. 13/190,955, filed Jul. 26, 2011, which is a continuation-in-part of U.S. nonprovisional patent application Ser. No. 12/120,586 filed May 14, 2008, which claims priority to and the benefit of U.S. provisional application Ser. No. 61/029,816 filed Feb. 19, 2008, the content of each of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

This disclosure relates to methods of identifying an organism, e.g., a microorganism. The methods can include imaging nucleic acid of the organism.

BACKGROUND

Physical mapping of genomes, e.g., using restriction endonucleases to develop restriction maps, can provide accurate information about the nucleic acid sequences of various organisms. Restriction maps of, e.g., deoxyribonucleic acid (DNA), can be generated by optical mapping. Optical mapping can produce ordered restriction maps by using fluorescence microscopy to visualize restriction endonuclease cutting events on individual labeled DNA molecules.

SUMMARY

The present invention provides methods of identifying an organism, e.g., a microorganism. The methods include obtaining a restriction map of a nucleic acid from an organism and correlating the restriction map of the nucleic acid with a restriction map database, thereby identifying the organism. With use of a detailed restriction map database, the organism can be identified and classified not just at a genus and species level, but also at a sub-species (strain), a sub-strain, and/or an isolate level. The featured methods offer fast, accurate, and detailed information for identifying organisms. The methods can be used in a clinical setting, e.g., a human or veterinary setting; or in an environmental or industrial setting (e.g., clinical or industrial microbiology, food safety testing, ground water testing, air testing, contamination testing, and the like). In essence, the invention is useful in any setting in which the detection and/or identification of a microorganism is necessary or desirable.

This invention also features methods of diagnosing a disease or disorder in a subject by, inter alia, identifying an organism by correlating the restriction map of a nucleic acid from the organism with a restriction map database and correlating the identity of the organism with the disease or disorder.

In one aspect, the invention provides a method of identifying an organism. The method includes obtaining a restriction digest of a nucleic acid sample, imaging the restriction fragments, and comparing the imaged data to a database. Restriction maps of the invention can be ordered by, for example, attaching nucleic acids to a surface, elongating them on the surface and exposing them to one or more restriction endonucleases. Generally, preferred methods of the invention comprise obtaining a nucleic acid sample from an organism; imaging the nucleic acid; obtaining a restriction map of the nucleic acid; and correlating the restriction map of the nucleic acid with a restriction map database, thereby identifying the organism.

The detected organism can be a microorganism, a bacterium, a protist, a virus, a fungus, or disease-causing organisms including microorganisms such as protozoa and multicellular parasites. The nucleic acid can be deoxyribonucleic acid (DNA), a ribonucleic acid (RNA) or can be a cDNA copy of an RNA obtained from a sample. The nucleic acid sample includes any tissue or body fluid sample, environmental sample (e.g., water, air, dirt, rock, etc.), and all samples prepared therefrom.

Methods of the invention can further include digesting nucleic acid with one or more enzymes, e.g., restriction endonucleases, e.g., BglII, NcoI, XbaI, and BamHI, prior to imaging. Preferred restriction enzymes include, but are not limited to:

AflII ApaLI BglII; AflII BglII NcoI; ApaLI BglII NdeI; AflII BglII MluI; AflII BglII PacI; AflII MluI NdeI; BglII NcoI NdeI; AflII ApaLI MluI; ApaLI BglII NcoI; AflII ApaLI BamHI; BglII EcoRI NcoI; BglII NdeI PacI; BglII Bsu36I NcoI; ApaLI BglII XbaI; ApaLI MluI NdeI; ApaLI BamHI NdeI; BglII NcoI XbaI; BglII MluI NcoI; BglII NcoI PacI; MluI NcoI NdeI; BamHI NcoI NdeI; BglII PacI XbaI; MluI NdeI PacI; Bsu36I MluI NcoI; ApaLI BglII NheI; BamHI NdeI PacI; BamHI Bsu36I NcoI; BglII NcoI PvuII; BglII NcoI NheI; BglII NheI PacI

Imaging ideally includes labeling the nucleic acid. Labeling methods are known in the art and can include any known label. However, preferred labels are optically-detectable labels, such as 4-acetamido-4′-isothiocyanatostilbene-2,2′-disulfonic acid; acridine and derivatives: acridine, acridine isothiocyanate; 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS); 4-amino-N-[3-(vinylsulfonyl)phenyl]naphthalimide-3,5 disulfonate; N-(4-anilino-1-naphthyl)maleimide; anthranilamide; BODIPY; Brilliant Yellow; coumarin and derivatives: coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumarin 151); cyanine dyes; cyanosine; 4′,6-diamidino-2-phenylindole (DAPI); 5′,5″-dibromopyrogallol-sulfonaphthalein (Bromopyrogallol Red); 7-diethylamino-3-(4′-isothiocyanatophenyl)-4-methylcoumarin; diethylenetriamine pentaacetate; 4,4′-diisothiocyanatodihydro-stilbene-2,2′-disulfonic acid; 4,4′-diisothiocyanatostilbene-2,2′-disulfonic acid; 5-[dimethylamino]naphthalene-1-sulfonyl chloride (DNS, dansyl chloride); 4-dimethylaminophenylazophenyl-4′-isothiocyanate (DABITC); eosin and derivatives: eosin, eosin isothiocyanate; erythrosin and derivatives: erythrosin B, erythrosin isothiocyanate; ethidium; fluorescein and derivatives: 5-carboxyfluorescein (FAM), 5-(4,6-dichlorotriazin-2-yl)aminofluorescein (DTAF), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein, fluorescein, fluorescein isothiocyanate, QFITC (XRITC); fluorescamine; IR144; IR1446; Malachite Green isothiocyanate; 4-methylumbelliferone; ortho-cresolphthalein; nitrotyrosine; pararosaniline; Phenol Red; B-phycoerythrin; o-phthaldialdehyde; pyrene and derivatives: pyrene, pyrene butyrate, succinimidyl 1-pyrene butyrate; quantum dots; Reactive Red 4 (Cibacron® Brilliant Red 3B-A); rhodamine and derivatives: 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101, sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid; terbium chelate derivatives; Cy3; Cy5; Cy5.5; Cy7; IRD 700; IRD 800; La Jolla Blue; phthalocyanine; naphthalocyanine; BOBO, POPO, YOYO, TOTO and JOJO.

A database for use in the invention can include a restriction map similarity cluster. The database can include a restriction map from at least one member of the clade of the organism. The database can include a restriction map from at least one subspecies of the organism. The database can include a restriction map from a genus, a species, a strain, a sub-strain, or an isolate of the organism. The database can include a restriction map with motifs common to a genus, a species, a strain, a sub-strain, or an isolate of the organism.

In another aspect, the invention features a method of diagnosing a disease or disorder in a subject, including (a) obtaining a sample suspected to contain an organism to be detected; (b) imaging a nucleic acid from the organism; (c) obtaining a restriction map of the nucleic acid; (d) identifying the organism by correlating the restriction map of the nucleic acid with a restriction map database; and (e) correlating the identity of the organism with the disease or disorder.

Methods can further include treating a disease or disorder in a subject, including diagnosing a disease or disorder in the subject as described above and providing treatment to the subject to ameliorate the disease or disorder. Treatment can include administering a drug to the subject.

In one embodiment, a restriction map obtained from a single DNA molecule is compared against a database of restriction maps from known organisms in order to identify the closest match to a restriction fragment pattern occurring in the database. This process can be repeated iteratively until sufficient matches are obtained to identify an organism at a predetermined confidence level. According to methods of the invention, nucleic acids from a sample are prepared and imaged as described herein. A restriction map is prepared and the restriction pattern is correlated with a database of restriction patterns for known organisms. In a preferred embodiment, organisms are identified from a sample containing a mixture of organisms. In a highly-preferred embodiment, methods of the invention are used to determine a ratio of various organisms present in a sample suspected to contain more than one organism. Moreover, use of methods of the invention allows the detection of multiple microorganisms from the same sample, either serially or simultaneously.

In use, the invention can be applied to identify a microorganism making up a contaminant in an environmental sample. For example, methods of the invention are useful to identify a potential biological hazard in a sample of air, water, soil, clothing, luggage, saliva, urine, blood, sputum, food, drink, and others. In a preferred embodiment, methods of the invention are used to detect and identify an organism in a sample obtained from an unknown source. In essence, methods of the invention can be used to detect biohazards in any environmental or industrial setting.

Further aspects and features of the invention will be apparent upon inspection of the following detailed description thereof.

All patents, patent applications, and references cited herein are incorporated in their entireties by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing restriction maps of six isolates of E. coli.

FIG. 2 is a diagram showing restriction maps of six isolates of E. coli clustered into three groups: O157 (that includes O157:H7 and 536), CFT (that includes CFT073 and 1381), and K12 (that includes K12 and 718).

FIG. 3 is a diagram showing common motifs among restriction maps of six isolates of E. coli.

FIG. 4 is a diagram showing restriction maps of six isolates of E. coli, with the boxes indicating regions common to E. coli.

FIG. 5 is a diagram showing restriction maps of six isolates of E. coli, with the boxes indicating regions that are unique to a particular strain, namely O157, CFT, or K12.

FIG. 6 is a diagram showing restriction maps of six isolates of E. coli, with the boxes indicating regions unique to each isolate.

FIG. 7 is a tree diagram, showing possible levels of identifying E. coli.

FIG. 8 is a diagram showing restriction maps of a sample (middle map) and related restriction maps from a database.

FIG. 9A is a set of three optical maps showing a whole genome comparison between three Group III strains of L. monocytogenes. The A23 strain is highly virulent while the BO43 and the 416 strains have low or no virulence. The maps show that the only difference discovered is a 48.3 kb region.

FIG. 9B is a set of three optical maps showing strains BO43 and 416 compared to strain EGD-e. Similar to the comparison in FIG. 9A, the only difference between the EGD-e strain and the BO43 and the 416 strains is the 48.3 kb insertion in the BO43 and the 416 strains.

FIG. 10 is a set of optical maps comparing the BO43 and the 416 strains of L. monocytogenes to the A23, EGD-e, 4b F2365, and Clip81459 ′4b CLIP80459′ strains of L. monocytogenes. The hatched segments represent regions conserved between the two low virulence strains (BO43 and 416) and the highly virulent strains (A23, EGD-e, 4b F2365, and Clip81459 ′4b CLIP80459′). The blank segments represent the insertion in strains 416 and BO43. The boxed region indicates the position of the clpP proteinase gene in the strains.

DETAILED DESCRIPTION

The present disclosure features methods of identifying an organism, e.g., a microorganism. The methods include obtaining a restriction map of a nucleic acid, e.g., DNA, from an organism and correlating the restriction map of the nucleic acid with a restriction map database, thereby identifying the organism. With use of a detailed restriction map database that contains motifs common to various groups and sub-groups, the organism can be identified and classified not just at a genus and species level, but also at a sub-species (strain), a sub-strain, and/or an isolate level. For example, bacteria can be identified and classified at a genus level, e.g., Escherichia genus, a species level, e.g., E. coli species, a strain level, e.g., O157, CFT, and K12 strains of E. coli, and an isolate level, e.g., O157:H7 isolate of E. coli (as described in Experiment 3B below). The featured methods offer fast, accurate, and detailed information for identifying organisms. These methods can be used in a variety of clinical settings, e.g., for identification of an organism in a subject, e.g., a human or an animal subject.

This disclosure also features methods of diagnosing a disease or disorder in a subject by, inter alia, identifying an organism via correlating the restriction map of a nucleic acid from the organism with a restriction map database, and correlating the identity of the organism with the disease or disorder. These methods can be used in a clinical setting, e.g., human or veterinary setting.

Methods of the invention are also useful for identifying and/or detecting an organism in food or in an environmental setting. For example, methods of the invention can be used to assess an environmental threat in drinking water, air, soil, and other environmental sources. Methods of the invention are also useful to identify organisms in food and to determine a common source of food poisoning in multiple samples that are separated in time or geographically, as well as samples that are from the same or similar batches.

Restriction Mapping

The methods featured herein utilize restriction mapping during both generation of the database and processing of an organism to be identified. One type of restriction mapping that can be used is optical mapping. Optical mapping is a single-molecule technique for production of ordered restriction maps from a single DNA molecule (Samad et al., Genome Res. 5:1-4, 1995). During this method, individual fluorescently labeled DNA molecules are elongated in a flow of agarose between a coverslip and a microscope slide (in the first-generation method) or fixed onto polylysine-treated glass surfaces (in a second-generation method). Id. The added endonuclease cuts the DNA at specific points, and the fragments are imaged. Id. Restriction maps can be constructed based on the number of fragments resulting from the digest. Id. Generally, the final map is an average of fragment sizes derived from similar molecules. Id. Thus, in one embodiment of the present methods, the restriction map of an organism to be identified is an average of a number of maps generated from the sample containing the organism.
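The averaging step described above can be illustrated with a minimal sketch. It assumes the single-molecule maps have already been aligned so that each position refers to the same restriction fragment; the function name consensus_map and the fragment sizes are hypothetical and are not taken from the cited references.

```python
from statistics import mean

def consensus_map(molecule_maps):
    """Average fragment sizes (kb) position by position.

    Assumption: every map is pre-aligned, so index i is the same
    restriction fragment in each molecule.
    """
    if not molecule_maps:
        raise ValueError("need at least one molecule map")
    n = len(molecule_maps[0])
    if any(len(m) != n for m in molecule_maps):
        raise ValueError("maps must be pre-aligned to the same fragment count")
    return [mean(column) for column in zip(*molecule_maps)]

# Hypothetical fragment sizes (kb) measured on three molecules.
maps_kb = [
    [10.0, 25.0, 7.0],
    [11.0, 24.0, 8.0],
    [12.0, 26.0, 9.0],
]
print(consensus_map(maps_kb))
```

In practice the maps would first need to be aligned and corrected for missing or false cuts before any per-fragment average is meaningful.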

Optical mapping and related methods are described in U.S. Pat. No. 5,405,519, U.S. Pat. No. 5,599,664, U.S. Pat. No. 6,150,089, U.S. Pat. No. 6,147,198, U.S. Pat. No. 5,720,928, U.S. Pat. No. 6,174,671, U.S. Pat. No. 6,294,136, U.S. Pat. No. 6,340,567, U.S. Pat. No. 6,448,012, U.S. Pat. No. 6,509,158, U.S. Pat. No. 6,610,256, and U.S. Pat. No. 6,713,263, each of which is incorporated by reference herein. Optical Maps are constructed as described in Reslewic et al., Appl Environ Microbiol. 2005 September; 71 (9):5511-22, incorporated by reference herein. Briefly, individual chromosomal fragments from test organisms are immobilized on derivatized glass by virtue of electrostatic interactions between the negatively-charged DNA and the positively-charged surface, digested with one or more restriction endonucleases, stained with an intercalating dye such as YOYO-1 (Invitrogen) and positioned onto an automated fluorescent microscope for image analysis. Since the chromosomal fragments are immobilized, the restriction fragments produced by digestion with the restriction endonuclease remain attached to the glass and can be visualized by fluorescence microscopy, after staining with the intercalating dye. The size of each restriction fragment in a chromosomal DNA molecule is measured using image analysis software and identical restriction fragment patterns in different molecules are used to assemble ordered restriction maps covering the entire chromosome.

Restriction Map Database

The database(s) used with the methods described herein can be generated by optical mapping techniques discussed supra. The database(s) can contain information for a large number of isolates, e.g., about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1,000, about 1,500, about 2,000, about 3,000, about 5,000, about 10,000 or more isolates. In addition, the restriction maps of the database contain annotated information (a similarity cluster) regarding motifs common to genus, species, sub-species (strain), sub-strain, and/or isolates for various organisms. The large number of the isolates and the information regarding specific motifs allows for accurate and rapid identification of an organism.

The restriction maps of the database(s) can be generated by digesting (cutting) nucleic acids from various isolates with specific restriction endonuclease enzymes. Some maps can be a result of digestion with one endonuclease. Some maps can be a result of a digest with a combination of endonucleases, e.g., two, three, four, five, six, seven, eight, nine, ten or more endonucleases. Exemplary endonucleases that can be used to generate restriction maps for the database(s) and/or the organism to be identified include: BglII, NcoI, XbaI, and BamHI. Non-exhaustive examples of other endonucleases that can be used include: AluI, ClaI, DpnI, EcoRI, HindIII, KpnI, PstI, SacI, and SmaI. Yet other restriction endonucleases are known in the art.
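As an illustration of how a digest yields an ordered list of fragment sizes, the following sketch performs an in silico digest of a known linear sequence. This is a swapped-in technique for illustration only: the disclosure generates maps by optical mapping of physical molecules, not from sequence. The recognition sequences and cut offsets shown for BglII, NcoI, XbaI, and BamHI are the standard ones; the function name digest and the toy sequence are hypothetical.

```python
# Recognition site and cut offset (bases from the start of the site).
RECOGNITION_SITES = {
    "BglII": ("AGATCT", 1),   # A^GATCT
    "NcoI": ("CCATGG", 1),    # C^CATGG
    "XbaI": ("TCTAGA", 1),    # T^CTAGA
    "BamHI": ("GGATCC", 1),   # G^GATCC
}

def digest(sequence, enzymes):
    """Return ordered fragment lengths (bp) for a linear molecule."""
    seq = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = RECOGNITION_SITES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence with one BglII site and one BamHI site.
toy = "AAAA" + "AGATCT" + "CCCCC" + "GGATCC" + "TT"
print(digest(toy, ["BglII"]))            # single-enzyme map
print(digest(toy, ["BglII", "BamHI"]))   # two-enzyme map
```

Digesting with a combination of enzymes simply pools the cut positions, which is why multi-enzyme maps contain more, shorter fragments than single-enzyme maps of the same molecule.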

Map alignments between different strains are generated with a dynamic programming algorithm which finds the optimal alignment of two restriction maps according to a scoring model that incorporates fragment sizing errors, false and missing cuts, and missing small fragments (See Myers et al., Bull Math Biol 54:599-618 (1992); Tang et al., J Appl Probab 38:335-356 (2001); and Waterman et al., Nucleic Acids Res 12:237-242). For a given alignment, the score is proportional to the log of the length of the alignment, penalized by the differences between the two maps, such that longer, better-matching alignments will have higher scores.
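The following is a simplified sketch of a dynamic programming map alignment, not the scoring model of the cited references. It assumes a global alignment in which each step matches a block of one or more consecutive fragments in one map to a block in the other, so that a false or missing cut appears as a merged block. The score used here (matched length minus sizing error minus a fixed penalty per unmatched cut) is an illustrative assumption; missing small fragments and the log-length term are not modeled. The names align_maps, max_merge, and cut_penalty are hypothetical.

```python
def align_maps(a, b, max_merge=2, cut_penalty=5.0):
    """Global DP alignment score of two ordered fragment-size lists (kb).

    score[i][j] is the best score aligning a[:i] with b[:j]. Each step
    matches 1..max_merge fragments of `a` to 1..max_merge fragments of `b`.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for di in range(1, min(max_merge, i) + 1):
                for dj in range(1, min(max_merge, j) + 1):
                    prev = score[i - di][j - dj]
                    if prev == neg_inf:
                        continue
                    la, lb = sum(a[i - di:i]), sum(b[j - dj:j])
                    # Reward matched length, penalize the sizing difference.
                    gain = min(la, lb) - abs(la - lb)
                    # Penalize each cut present in one map but not the other.
                    gain -= cut_penalty * (di + dj - 2)
                    score[i][j] = max(score[i][j], prev + gain)
    return score[n][m]

reference = [10, 20, 30]
print(align_maps(reference, [10, 20, 30]))  # identical maps
print(align_maps(reference, [10, 50]))      # one cut missing in second map
```

Longer, better-matching alignments receive higher scores under this scheme, which mirrors the qualitative behavior described above.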

To generate similarity clusters, each map is aligned against every other map. From these alignments, a pair-wise alignment analysis is performed to determine “percent dissimilarity” between the members of the pair by taking the total length of the unmatched regions in both genomes divided by the total size of both genomes. These dissimilarity measurements are used as inputs into the agglomerative clustering method “Agnes” as implemented in the statistical package “R”. Briefly, this clustering method works by initially placing each entry in its own cluster, then iteratively joining the two nearest clusters, where the distance between two clusters is the smallest dissimilarity between a point in one cluster and a point in the other cluster.
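The dissimilarity measure and the clustering rule described above can be sketched in plain Python. This is a minimal illustration and not the "Agnes" implementation in R: it follows the rule stated in the text (join the two clusters with the smallest dissimilarity between any pair of members). The isolate names and dissimilarity values are invented for the example.

```python
def percent_dissimilarity(unmatched_a, unmatched_b, size_a, size_b):
    """Unmatched length in both genomes over total size of both, as a percent."""
    return 100.0 * (unmatched_a + unmatched_b) / (size_a + size_b)

def agglomerate(names, dist, n_clusters):
    """Repeatedly join the two nearest clusters until n_clusters remain.

    dist maps frozenset({x, y}) to the dissimilarity between x and y.
    """
    clusters = [{name} for name in names]

    def cluster_distance(c1, c2):
        return min(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Invented percent dissimilarities between four hypothetical isolates.
dist = {
    frozenset(("A1", "A2")): 2.0,
    frozenset(("B1", "B2")): 3.0,
    frozenset(("A1", "B1")): 40.0,
    frozenset(("A1", "B2")): 42.0,
    frozenset(("A2", "B1")): 41.0,
    frozenset(("A2", "B2")): 43.0,
}
print(percent_dissimilarity(100, 150, 5000, 5000))
print(agglomerate(["A1", "A2", "B1", "B2"], dist, n_clusters=2))
```

Stopping at a chosen number of clusters is a simplification; a full hierarchical analysis would record the merge order and heights to build a tree.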

Organisms To Be Identified

Various organisms, e.g., viruses, and various microorganisms, e.g., bacteria, protists, and fungi, can be identified with the methods featured herein. In one embodiment, the organism's genetic information is stored in the form of DNA. The genetic information can also be stored as RNA.

The sample containing the organism to be identified can be a human sample, e.g., a tissue sample, e.g., epithelial (e.g., skin), connective (e.g., blood and bone), muscle, and nervous tissue, or a secretion sample, e.g., saliva, urine, tears, and feces sample. The sample can also be a non-human sample, e.g., a horse, camel, llama, cow, sheep, goat, pig, dog, cat, weasel, rodent, bird, reptile, and insect sample. The sample can also be from a plant, water source, food, air, soil, or other environmental or industrial sources.

Identifying an Organism

The methods described herein, i.e., methods of identifying an organism, diagnosing a disease or disorder in a subject, determining antibiotic resistance of an organism, determining an antibiotic resistance profile of a bacterium, determining a therapeutically effective antibiotic to administer to a subject, and treating a subject, include correlating the restriction map of a nucleic acid of an organism with a restriction map database. The methods involve comparing each of the raw single molecule maps from the unknown sample (or an average restriction map of the sample) against each of the entries in the database, and then combining match probabilities across different molecules to create an overall match probability.

In one embodiment of the methods, the entire genome of the organism to be identified can be compared to the database. In another embodiment, several methods of extracting shared elements from the genome can be created to generate a reduced set of regions of the organism's genome that can still serve as a reference point for the matching algorithms.

As discussed above and in the Examples below, the restriction maps of the database can contain annotated information (a similarity cluster) regarding motifs common to genus, species, sub-species (strain), sub-strain, and/or isolates for various organisms. Such detailed information would allow identification of an organism at a sub-species level, which, in turn, would allow for a more accurate diagnosis and/or treatment of a subject carrying the organism.

In another embodiment, methods of the invention are used to identify genetic motifs that are indicative of an organism, strain, or condition. For example, methods of the invention are used to identify in an isolate at least one motif that confers antibiotic resistance. This allows appropriate choice of treatment without further cluster analysis.

Applications

The methods described herein can be used in a variety of settings, e.g., to identify an organism in a human or a non-human subject, in food, in environmental sources (e.g., water, air), and in industrial settings. The featured methods also include methods of diagnosing a disease or disorder in a subject, e.g., a human or a non-human subject, and treating the subject based on the diagnosis. The method includes: obtaining a sample comprising an organism from the subject; imaging a nucleic acid from the organism; obtaining a restriction map of said nucleic acid; identifying the organism by correlating the restriction map of said nucleic acid with a restriction map database; and correlating the identity of the organism with the disease or disorder.

As discussed above, various organisms can be identified by the methods discussed herein and therefore various diseases and disorders can be diagnosed by the present methods. The organism can be, e.g., a cause, a contributor, and/or a symptom of the disease or disorder. In one embodiment, more than one organism can be identified by the methods described herein, and a combination of the organisms present can lead to diagnosis. Skilled practitioners would be able to correlate the identity of an organism with a disease or disorder. For example, the following is a non-exhaustive list of some diseases and bacteria known to cause them: tetanus (Clostridium tetani); tuberculosis (Mycobacterium tuberculosis); meningitis (Neisseria meningitidis); botulism (Clostridium botulinum); bacterial dysentery (Shigella dysenteriae); Lyme disease (Borrelia burgdorferi); gastroenteritis (E. coli and/or Campylobacter spp.); food poisoning (Clostridium perfringens, Bacillus cereus, Salmonella enteritidis, and/or Staphylococcus aureus). These and other diseases and disorders can be diagnosed by the methods described herein.

Once a disease or disorder is diagnosed, a decision about treating the subject can be made, e.g., by a medical provider or a veterinarian. Treating the subject can involve administering a drug or a combination of drugs to ameliorate the disease or disorder to which the identified organism is contributing or of which the identified organism is a cause. Amelioration of the disease or disorder can include reduction in the symptoms of the disease or disorder. The drug administered to the subject can include any chemical substance that affects the processes of the mind or body, e.g., an antibody and/or a small molecule. The drug can be administered in the form of a composition, e.g., a composition comprising the drug and a pharmaceutically acceptable carrier. The composition can be in a form suitable for, e.g., intravenous, oral, topical, intramuscular, intradermal, subcutaneous, and anal administration. Suitable pharmaceutical carriers include, e.g., sterile saline, physiological buffer solutions and the like. The pharmaceutical compositions may be additionally formulated to control the release of the active ingredients or prolong their presence in the patient's system. Numerous suitable drug delivery systems are known for this purpose and include, e.g., hydrogels, hydroxymethylcellulose, microcapsules, liposomes, microemulsions, microspheres, and the like. Treating the subject can also include chemotherapy and radiation therapy.

Identifying Virulent and Low- or Non-Virulent Strains of Listeria monocytogenes

Listeria monocytogenes is an organism that causes listeriosis, which is one of the leading causes of death from food-borne pathogens, especially in pregnant women, newborns, the elderly, and immuno-compromised individuals. It is found in environments such as decaying vegetable matter, sewage, water, and soil, and it can survive extremes of both temperature (from about 1° C. to about 45° C.) and salt concentration. Due to these characteristics, L. monocytogenes is an extremely dangerous food-borne pathogen, especially in food that is not reheated. The bacterium can spread from an infection site in the intestines to the central nervous system and, in the case of a pregnant woman, to the fetal-placental unit.

Meningitis (inflammation of the membrane surrounding spinal cord and brain), gastroenteritis (inflammation of mucous membranes of stomach and intestine), and septicemia (systemic spread of bacteria and toxins in the blood) can result from infection. This organism is enteroinvasive, and utilizes an actin-based motility system by using a surface protein, ActA, that promotes actin polymerization, to spread intercellularly using the polymerized cytoskeletal protein as a motor. There are 13 serovars associated with L. monocytogenes, and the serovar 4b strains are more commonly associated with invasive disease.

Methods of the invention were used to obtain optical maps of strains of L. monocytogenes that are known to be virulent or low- or non-virulent. Optical maps were obtained for the following strains: L. monocytogenes BO43; L. monocytogenes 416; L. monocytogenes strain 4b F2365; L. monocytogenes strain EGD-e; L. monocytogenes strain A23; L. monocytogenes Clip81459 ′4b CLIP80459′. L. monocytogenes strain 4b F2365 is a highly virulent strain of serotype 4b. This strain was isolated in 1985 in California, USA, during an outbreak of listeriosis among patients with AIDS, from a cheese product that caused the outbreak. L. monocytogenes Clip81459 ′4b CLIP80459′ and L. monocytogenes A23 are also highly virulent strains of serotype 4b. L. monocytogenes strain EGD-e is a highly virulent strain of serovar 1/2a that has numerous pathogenicity islands and genes. This strain is derived from the strain EGD that was used in studies of cell-mediated immunity. L. monocytogenes BO43 and L. monocytogenes 416 are low- and/or non-virulent strains of L. monocytogenes.

Upon comparison of the optical maps of these strains of L. monocytogenes, it was found that the two low- and/or non-virulent strains of L. monocytogenes (BO43 and 416) contain a 48.3 kb insertion in their genomes when compared to the highly virulent strains of L. monocytogenes (FIGS. 9-10). It was also determined that this is the sole architectural difference between these two low- and/or non-virulent strains and the highly virulent strains (FIGS. 9-10).

Without being limited by any theory or particular mechanism of action, it is believed that the loss of virulence in these strains is due to insertional inactivation of a virulence gene such as clpP (product ATP-dependent Clp protease proteolytic subunit), which is found in the same approximate location as the insertion site in the sequenced virulent L. monocytogenes strain EGD-e. The ATP-dependent caseinolytic proteases (clp) are important in resistance against environmental stresses, antibiotic treatments and host immune defenses for a number of pathogenic bacteria. clpP is the proteolytic subunit, whilst clpA acts both as a chaperone and as an ATPase driving the degradation of damaged or mis-made proteins.

Pathogenic organisms invading a host are exposed to host immune cells and, upon treatment with antibiotics, conditions likely to cause protein damage. Disruption of the clpP gene in strains BO43 and 416 by the inserted DNA may diminish the protein-repair capability of these organisms, resulting in less virulent strains.

These data show that this 48.3 kb region may be used as a marker to distinguish virulent and low- and/or non-virulent strains of L. monocytogenes. Methods of the invention use optical mapping as described above to distinguish between virulent and low- or non-virulent strains of L. monocytogenes. Methods of the invention involve obtaining a nucleic acid from L. monocytogenes; preparing an optical map of one or more restriction digests of the obtained nucleic acid; and detecting an insertion in a clpP gene of a genome of L. monocytogenes, wherein presence of the insertion is indicative of a low- or non-virulent strain of L. monocytogenes. Therefore, methods of the invention may be used to detect the presence or absence of this insertion in strains of L. monocytogenes to predict whether a strain of L. monocytogenes is likely to be virulent or low- or non-virulent.
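Detection of an insertion by map comparison can be sketched as follows. This is a minimal illustration under strong assumptions: the two maps differ at a single locus, both are read in the same orientation, and fragments match when their sizes agree within a relative tolerance. The function name find_insertion, the tolerance, and all fragment sizes are hypothetical; the sizes were invented so that the net difference equals the 48.3 kb figure given in the text and are not measured data.

```python
def find_insertion(reference, sample, tol=0.1):
    """Return (net_inserted_kb, offset_kb) for a single differing locus.

    Matching fragments are trimmed from both ends; the net size of what
    remains in `sample` minus what remains in `reference` is reported,
    together with the distance from the map start to the locus.
    """
    def same(x, y):
        return abs(x - y) <= tol * max(x, y)

    shortest = min(len(reference), len(sample))
    p = 0  # number of matching fragments at the start
    while p < shortest and same(reference[p], sample[p]):
        p += 1
    s = 0  # number of matching fragments at the end
    while s < shortest - p and same(reference[-1 - s], sample[-1 - s]):
        s += 1
    extra = sample[p:len(sample) - s]
    missing = reference[p:len(reference) - s]
    return sum(extra) - sum(missing), sum(sample[:p])

# Invented maps (kb): the sample carries extra DNA inside the third
# reference fragment, which also introduces new restriction sites.
reference_map = [30.0, 45.0, 20.0, 60.0]
sample_map = [30.0, 45.0, 12.0, 25.3, 31.0, 60.0]

size_kb, offset_kb = find_insertion(reference_map, sample_map)
print(round(size_kb, 1), offset_kb)
```

A net size near zero would suggest that no insertion is present at the compared locus under these assumptions.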

The following examples provide illustrative embodiments of the present methods and should not be treated as restrictive.

EXAMPLE 1 Microbial Identification Using Optical Mapping

Microbial identification (ID) generally has two phases. In the first, DNA from a number of organisms is mapped and the maps are compared against one another. From these comparisons, important phenotypes and taxonomy are linked with map features. In the second phase, single molecule restriction maps are compared against the database to find the best match.

Database Building and Annotation

Maps sufficient to represent a diversity of organisms, on the basis of which it will be possible to discriminate among various organisms, are generated. The greater the diversity in the organisms in the database, the more precise will be the ability to identify an unknown organism. Ideally, a database contains sequence maps of known organisms at the species and sub-species level for a sufficient variety of microorganisms so as to be useful in a medical or industrial context. However, the precise number of organisms that are mapped into any given database is determined at the convenience of the user based upon the desired use to which the database is to be put.

After a sufficient number of microorganisms are mapped, a map similarity cluster is generated. First, trees of maps are generated. After the tree construction, various phenotypic and taxonomic data are overlaid, and regions of the maps that uniquely distinguish individual clades from the rest of the populations are identified. The goal is to find particular clades that correlate with phenotypes/taxonomies of interest, which will be driven in part through improvements to the clustering method.

Once the clusters and trees have been annotated, the annotation will be applied back down to the individual maps. Additionally, if needed, the database will be trimmed to include only key regions of discrimination, which may improve run-time performance.

Calling (Identifying) an Unknown

One embodiment of testing the unknowns involves comparing each of the raw single molecule maps from the unknown sample against each of the entries in the database, and then combining match probabilities across different molecules to create an overall match probability.
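
One way the per-molecule match probabilities might be combined is sketched below in Python. The sketch assumes that each single molecule map has already been scored against each database entry, that molecules are treated as independent observations, and that all entries carry equal prior weight. The function name, the input layout, and the floor value are hypothetical; the disclosure does not specify a particular combination rule.

```python
import math


def combine_molecule_matches(per_molecule_probs, floor=1e-12):
    """Combine per-molecule match probabilities into one score per entry.

    per_molecule_probs: list of dicts mapping a database entry name to the
    probability that one molecule matches that entry (hypothetical layout).
    Returns a dict of normalized overall match probabilities.
    """
    entries = set()
    for probs in per_molecule_probs:
        entries.update(probs)
    if not entries:
        return {}

    # Sum log-probabilities so that many molecules do not underflow;
    # a missing or zero score is replaced by a small floor.
    log_scores = {
        entry: sum(math.log(max(probs.get(entry, 0.0), floor))
                   for probs in per_molecule_probs)
        for entry in entries
    }

    # Normalize relative to the best entry for numerical stability.
    peak = max(log_scores.values())
    weights = {e: math.exp(s - peak) for e, s in log_scores.items()}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}


# Illustrative values only: two molecules scored against two entries.
molecules = [{"entry_A": 0.8, "entry_B": 0.2},
             {"entry_A": 0.6, "entry_B": 0.4}]
overall = combine_molecule_matches(molecules)
print(max(overall, key=overall.get))
```

Working in log space is a design choice of the sketch: a sample may contribute thousands of molecules, and a direct product of probabilities would quickly round to zero.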

The discrimination among closely related organisms can be done by simply picking the most hits or the best match probability by comparing data obtained from the organism to data in the database. More precise comparisons can be done by having detailed annotations on each genome for what is a discriminating characteristic of that particular genome versus what is a common motif shared among several isolates of the same species. Thus, when match scores are aggregated, the level of categorization (rather than a single genome) will receive a probability. Therefore, extensive annotation of the genomes in terms of what is a defining characteristic and what is shared will be required.
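
The aggregation of match scores at a level of categorization can be sketched as follows. This Python example assumes that each database entry already has an overall match probability and an annotation giving its label at each taxonomic level. The function name, the annotation layout, and the probability values are hypothetical; the strain groupings follow those described for the E. coli isolates in Example 3.

```python
from collections import defaultdict


def aggregate_by_level(entry_probs, annotations, level):
    """Sum entry-level match probabilities over a taxonomic level.

    entry_probs: dict of entry name -> match probability.
    annotations: dict of entry name -> {level name: label}.
    level: the level to report, e.g. "species" or "strain".
    """
    totals = defaultdict(float)
    for entry, prob in entry_probs.items():
        # Entries lacking an annotation at this level are pooled together.
        label = annotations.get(entry, {}).get(level, "unannotated")
        totals[label] += prob
    return dict(totals)


annotations = {
    "K12":    {"species": "E. coli", "strain": "K12"},
    "718":    {"species": "E. coli", "strain": "K12"},
    "CFT073": {"species": "E. coli", "strain": "CFT"},
}
# Illustrative probabilities only.
entry_probs = {"K12": 0.5, "718": 0.3, "CFT073": 0.2}

print(aggregate_by_level(entry_probs, annotations, "strain"))
print(aggregate_by_level(entry_probs, annotations, "species"))
```

In this illustration no single isolate is a decisive match, yet the strain-level and species-level totals are clear, which is the benefit of assigning the probability to a category rather than to one genome.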

In one embodiment of the method, entire genomes will be compared to all molecules. Because there will generally be much overlap of maps within a species, another embodiment can be used. In the second embodiment, several methods of extracting shared elements from the genome will be created to generate a reduced set of regions that can still serve as a reference point for the matching algorithms. The second embodiment will allow for streamlining the reference database to increase system performance.

EXAMPLE 2 Using Multiple Enzymes for Microbial Identification

In one embodiment, the single molecule restriction maps from each of the enzymes will be compared against the database described in Example 1 independently, and a probable identification will be called from each enzyme independently. Then, the final match probabilities will be combined as independent experiments. This embodiment will provide some built-in redundancy and therefore accuracy for the process.

Introduction

In general, optical mapping can be used within a specific range of average fragment sizes, and for any given enzyme there is considerable variation in the average fragment size across different genomes. For these reasons, it typically will not be optimal to select a single enzyme for identification of clinically-relevant microbes. Instead, a small set of enzymes will be chosen to optimize the probability that for every organism of interest, there will be at least one enzyme in the database suitable for mapping.

Selection Criteria

A first step in the selection of enzymes was the identification of the bacteria of interest. These bacteria were classified into two groups: (a) the most common clinically interesting organisms and (b) other bacteria involved in human health. The chosen set of enzymes must have at least one enzyme that cuts each of the common clinically interesting bacteria within the range of average fragment sizes suitable for detailed comparisons of closely related genomes (about 6-13 kb). Additionally, for the remaining organisms, each fragment must be within the functional range for optical mapping (about 4-20 kb). These limits were determined through mathematical modeling, directed experiments, and experience with customer orders. Finally, enzymes that have already been used for Optical Mapping were selected.
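
The selection criteria above can be expressed as a small covering search, sketched here in Python. The sketch assumes that an average fragment size (in kb) is available for every enzyme and organism pair, and it applies the functional range to the average fragment size for the remaining organisms. The function names, the organism names, and all fragment sizes are hypothetical; only the two size ranges and the enzyme names come from the text.

```python
from itertools import combinations

DETAILED_RANGE_KB = (6.0, 13.0)    # common clinically interesting bacteria
FUNCTIONAL_RANGE_KB = (4.0, 20.0)  # remaining organisms


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def covers(enzyme_set, avg_fragment_kb, common, other):
    """True if every organism has at least one suitable enzyme in the set."""
    for organisms, bounds in ((common, DETAILED_RANGE_KB),
                              (other, FUNCTIONAL_RANGE_KB)):
        for organism in organisms:
            if not any(in_range(avg_fragment_kb[e][organism], bounds)
                       for e in enzyme_set):
                return False
    return True


def smallest_enzyme_sets(avg_fragment_kb, common, other):
    """Return all covering enzyme sets of the smallest possible size."""
    enzymes = sorted(avg_fragment_kb)
    for size in range(1, len(enzymes) + 1):
        hits = [combo for combo in combinations(enzymes, size)
                if covers(combo, avg_fragment_kb, common, other)]
        if hits:
            return hits
    return []


# Hypothetical average fragment sizes (kb), for illustration only.
avg_fragment_kb = {
    "BglII": {"org_a": 8.0,  "org_b": 3.0,  "org_c": 25.0},
    "NcoI":  {"org_a": 15.0, "org_b": 10.0, "org_c": 30.0},
    "XbaI":  {"org_a": 30.0, "org_b": 40.0, "org_c": 18.0},
}
print(smallest_enzyme_sets(avg_fragment_kb,
                           common=["org_a", "org_b"], other=["org_c"]))
```

An exhaustive search is practical here because the pool of candidate enzymes is small; a larger pool would call for a greedy or integer-programming approach to the same covering problem.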

Suggested Set

Based upon the above criteria, the preliminary set consisted of the enzymes BglII, NcoI, and XbaI, which have been used for optical mapping. There are 28 additional sets that cover the key organisms with known enzymes, so in the event that this set is not adequate, these alternatives will be utilized (data not shown).

Final Steps

Because the analysis in Experiment 2 is focused on the sequenced genomes, prior to full database production, this set of enzymes will be tested against other clinically important genomes, which will be part of the first phase of the proof of principle study.

EXAMPLE 3 Identification of E. coli

A. In one embodiment of a microbial identification method, nucleic acids of between about 500 and about 1,000 isolates will be optically mapped. Then, unique motifs will be identified across genus, species, strains, substrains, and isolates. To identify a sample, single nucleic acid molecules of the sample will be aligned against the motifs, and p-values assigned for each motif match. The p-values will be combined to find likelihood of motifs. The most specific motif will give the identification.
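
The text does not state how the per-molecule p-values are to be combined. As one possible illustration, the Python sketch below uses Fisher's method, a standard technique for combining independent p-values, which is substituted here as an assumption and is not taken from the disclosure. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values for one motif using Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the upper tail has a closed form, so no external library is
    needed.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    k = len(p_values)

    # Upper tail: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Illustrative p-values for molecules aligned to a single motif.
print(fisher_combined_p([0.1, 0.1]))
```

A combined p-value computed in this way for each motif could then be compared against a chosen significance threshold, with the most specific motif that passes the threshold giving the identification.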

B. The following embodiment illustrates a method of identifying E. coli down to an isolate level. Restriction maps of six E. coli isolates were obtained by digesting nucleic acids of these isolates with BamHI restriction enzyme. FIG. 1 shows restriction maps of these six E. coli isolates: 536, O157:H7 (complete genome), CFT073 (complete genome), 1381, K12 (complete genome), and 718. As shown in FIG. 2, the isolates clustered into three sub-groups (strains): O157 (that includes O157:H7 and 536), CFT (that includes CFT073 and 1381), and K12 (that includes K12 and 718).

These restriction maps provided multi-level information regarding relation of these six isolates, e.g., showed motifs that are common to all of the three sub-groups (see, FIG. 3) and regions specific to E. coli (see, boxed areas in FIG. 4). The maps were also able to show regions unique to each strain (see, boxed areas in FIG. 5) and regions specific to each isolate (see boxed regions in FIG. 6).

This and similar information can be stored in a database and used to identify bacteria of interest. For example, a restriction map of an organism to be identified can be obtained by digesting the nucleic acid of the organism with BamHI. This restriction map can be compared with the maps in the database. If the map of the organism to be identified contains motifs specific to E. coli, to one of the sub-groups, to one of the strains, and/or to a specific isolate, the identity of the organism can be obtained by correlating the specific motifs. FIG. 6 shows a diagram to illustrate the possibilities of traversing variable lengths of a similarity tree.
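
The idea of traversing a similarity tree only as far as the sample's motifs allow can be sketched in Python as follows. The sketch assumes that motif detection has already been reduced to a set of motif identifiers found in the sample. The tree layout, the function name, and the motif identifiers are hypothetical; the labels follow the E. coli sub-groups and isolates described above.

```python
def identify(sample_motifs, root):
    """Walk down the tree while a node's motifs are all found in the sample.

    Returns the list of labels from the root to the deepest matching node,
    so the length of the result reflects how specific the call is.
    """
    path = []
    candidates = [root]
    while candidates:
        match = next((node for node in candidates
                      if node["motifs"] <= sample_motifs), None)
        if match is None:
            break
        path.append(match["label"])
        candidates = match.get("children", [])
    return path


# Hypothetical motif identifiers attached to the groupings of Example 3.
tree = {
    "label": "E. coli", "motifs": {"m_species"},
    "children": [
        {"label": "O157", "motifs": {"m_o157"}, "children": [
            {"label": "O157:H7", "motifs": {"m_o157h7"}},
            {"label": "536", "motifs": {"m_536"}},
        ]},
        {"label": "K12", "motifs": {"m_k12"}, "children": [
            {"label": "718", "motifs": {"m_718"}},
        ]},
    ],
}

print(identify({"m_species", "m_k12"}, tree))
print(identify({"m_species", "m_k12", "m_718"}, tree))
```

With only the species and sub-group motifs present, the walk stops at the sub-group; adding an isolate-specific motif lets it continue to the isolate.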

C. The following example illustrates identifying a sample as an E. coli bacterium. A sample (sample 28) was digested with BamHI and its restriction map obtained (see FIG. 8, middle restriction map). This sample was aligned against a database that contained various E. coli isolates. The sample was found to be similar to four E. coli isolates: NC 002695, AC 000091, NC 000913, and NC 002655. The sample was therefore identified as E. coli bacterium that is most closely related to the AC 000091 isolate.

The embodiments of the disclosure may be carried out in other ways than those set forth herein without departing from the spirit and scope of the disclosure. The embodiments are, therefore, to be considered to be illustrative and not restrictive. References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.

1-9. (canceled)
 10. A method of distinguishing virulent and low- or non-virulent strains of Listeria monocytogenes, the method comprising: obtaining a nucleic acid from L. monocytogenes; preparing an optical map of one or more restriction digests of the obtained nucleic acid; and detecting an insertion that is at least 1 kb in size in a clpP gene of a genome of L. monocytogenes, wherein the insertion results in inactivation of the clpP gene, which is indicative of a low- or non-virulent strain of L. monocytogenes.
 11. The method according to claim 10, wherein the nucleic acid is obtained from at least one sample type selected from the group consisting of: a food sample, an environmental sample, and a human tissue or body fluid sample.
 12. The method according to claim 11, wherein the environmental sample is selected from the group consisting of water, soil, sewage, and decaying vegetable matter.
 13. The method according to claim 11, wherein the tissue or body fluid is from a human having a disease selected from the group consisting of meningitis, gastroenteritis, and septicemia.
 14. The method according to claim 10, wherein the insertion is a 48.3 kb insertion.
 15. The method according to claim 10, wherein the L. monocytogenes is classified as a serotype 4b strain.
 16. The method according to claim 10, wherein the L. monocytogenes is classified as a serotype 1/2a strain.
 17. The method according to claim 10, wherein the L. monocytogenes low- or non-virulent strain is selected from the group consisting of L. monocytogenes BO43 or L. monocytogenes 416.
 18. The method according to claim 10, wherein the L. monocytogenes virulent strain is selected from the group consisting of L. monocytogenes strain 4b F2365, L. monocytogenes strain EGD-e, L. monocytogenes strain A23, L. monocytogenes Clip81459 ′4b CLIP80459′.